INTRODUCTION



Problem 

Measurement theorists have argued convincingly that the current crisis
in education stems from the lack of a scientific basis for writing achievement
test questions, or items. This crisis has been intensified by an increased
public demand for accountability in education and by interest in the use of
tests for selection, placement, advancement, certification, and other important
decisions that deeply affect people's lives. Although it is reasonable to
expect that such decisions would involve reliable and appropriate tests,
test specialists currently must work without the aid of a systematic
technique for writing test items. Instead, for both criterion-referenced
tests (in which an individual's performance is compared to a standard rather
than to that of other individuals) and traditional norm-referenced
tests, they must rely on their intuitive skills or on those of experts to
assess questions' merits.

Even when item writers are given learning objectives that describe what
is to be learned in terms of expected student performance under specified
conditions and standards, they will not necessarily generate the same items
or even items of similar quality. Current military guidelines for designing
criterion-referenced tests for use in instructional systems (Swezey & Pearlstein,
1974) refer to the "writing of test items for each learning objective,"
but do not provide detailed suggestions for writing such items. Item-writing
methods are needed that are (1) based on a logically and precisely defined
relationship between the text and the test items written to assess learning
from that text, (2) defined by a set of operations open to public inspection,
and (3) capable of producing items that can be easily replicated by many
test developers.

Use of such methods should allow tests to become more scientific instru- 
ments, and contribute to the advancement of instructional research, educa- 
tional evaluation, and the use of test data in forming public policy. 

Background 

Although theories and suggestions have been published concerning new
item-writing methods, little specific research has been conducted to
determine either the technical quality of items written by such methods or the
feasibility of their widespread use in education and training. Only a
handful of civilian research studies, most of which are currently
unpublished, have examined the technical and measurement qualities of the new
item-writing methods, such as those that can be applied algorithmically.
If these methods are to be used in military training and to reshape the
everyday practices of educational testing in the United States, they must
have a strong research base.

There is an even more practical reason for interest in algorithmic
methods of writing test questions: When students are to be retested several
times, particularly when using instructional systems that involve the mastery
learning model (Bloom, 1968), multiple test forms must be provided that are
equivalent in both content coverage and difficulty. Although such test forms
could be assessed and revised through field tests, much time and energy
could be saved if forms of near equivalency could be produced
algorithmically.

Roid and Haladyna (1978), in comparing item-writing techniques (e.g.,
Millman, 1974; Bormuth, 1970), found that one of two item writers produced
consistently more difficult test items from the same learning objectives.
The resulting differences in test difficulty would have serious implications
for the criterion-referenced uses of such tests (e.g., those affecting
pass-fail decisions).

Anderson (1972, pp. 151-159) proposed various item-writing methods to 
test the learning of concepts and principles. These methods rely on an
analysis of examples and nonexamples of a concept or a principle and usually 
go beyond the verbatim wording used in the instructional materials. Tiemann, 
Kroeker, and Markle (Note 1) have devised plans for sampling examples and 
nonexamples of concepts in both teaching and testing settings. 

Bormuth (1970) proposed operationally defined item-writing rules for
transforming segments of prose material to obtain items that test recall of
such material. Specifically, he proposed rules for deriving items from
sentences, and from the relationships between sentences (pp. 39-55). One
example of sentence-derived items is the set produced by the
"wh-transformation," which requires the writer to inspect all sentences in the
instruction and to substitute a "wh-pro" word such as who, what, or where for,
say, the subject of each sentence. For instance, "The boy rode the horse"
could be transformed to "Who rode the horse?" Items derived by this method
are particularly useful because they can be written to cover each part of
a sentence and tailored to either the multiple-choice or fill-in format.
Sentence-derived items can also result through the use of paraphrasing;
that is, by replacing substantive words in a sentence with others having
the same meaning.
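
As a rough illustration of the wh-transformation, the following Python sketch replaces a chosen phrase with a wh-word and turns the sentence into a question. The function name `wh_transform` and its parameters are hypothetical, introduced only for this example; the sketch performs plain string substitution and assumes the item writer has already chosen the phrase to replace and the appropriate wh-word. It is not Bormuth's full rule set.

```python
def wh_transform(sentence, target, wh_word="what"):
    """Replace the first occurrence of `target` with `wh_word` and
    return the result as a question.

    Hypothetical helper: the caller supplies the phrase to replace
    and the wh-word, so no grammatical analysis is attempted here.
    """
    # Drop trailing spaces and periods so a question mark can be appended.
    body = sentence.rstrip(" .")
    if target not in body:
        raise ValueError("target phrase not found in sentence")
    question = body.replace(target, wh_word, 1)
    # Capitalize the first character in case the wh-word starts the sentence.
    return question[0].upper() + question[1:] + "?"


if __name__ == "__main__":
    print(wh_transform("The boy rode the horse.", "The boy", "who"))
```

Because the substitution is purely textual, the sketch cannot decide which wh-word fits a phrase or whether the resulting question is well formed; those judgments remain with the item writer.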

Items can be derived from the relationships between sentences by
questioning the cause of a described action or result. For instance, the
sentences "Jim hurt his foot," "He was cleaning his gun," and "His gun accidentally
fired" can be examined for implied causation, resulting in the question
"What caused Jim's hurt foot?"

Finn (1975) extended Bormuth's work by developing a question-writing
algorithm for learning from prose. The principal steps in this algorithm
are described in the following paragraphs.

1. Computer Analysis of Passage or Text. The passage or text is analyzed
by keypunching all words and entering them in a computer program that (a)
counts the number of times that each word appears in the passage (text
frequency) and (b) calculates its standard frequency index (SFI), which is a
numerical estimate of how often the word appears in a large corpus (five
million words) of American English (Carroll, Davies, & Richman, 1971).
The SFI ranges from 88.6 for the word "the" to 02.5 for the word
"incarnation" (i.e., the average student is likely to encounter the word "the" once
in every 10 words of his schoolbook reading and the word "incarnation" less
often than once in every billion words).

2. Identification of Candidate Sentences for Transformation into Items.
Words having a low SFI (that is, words that are relatively rare in American
English) are called high information words. The sentences in which these words
appear can be regarded as candidates for transformation into questions that tap
important information in the passage.

3. Selection of High Information Words for Use as Question Words.
High information words usually are difficult for subjects to guess if they
are deleted from a prose passage, which is the method used in cloze tests
(Culhane, 1970). In such tests, segments of prose are presented to a
subject, usually with every fifth word deleted, and he is asked to supply
the missing words. The ease with which he supplies a missing word is a
measure of the amount of information it provides.

Finn (Note 2) found that the cloze easiness of a word can be
predicted by the two indices derived from computer analysis of a passage;
that is, word frequency and SFI. A word having a low SFI is typically high
in information. However, if this word appears frequently in the passage,
its information value will be diminished because subjects will supply it
more easily in a cloze test following reading of the passage. In other
words, repetition of words, even if they are rare in American English, lowers
their information value. Therefore, Finn concluded that good candidate
question words must have a low SFI and must occur only once in a prose
passage.

Not all parts of speech, even if they meet the above criteria, are
equally good candidates for question words. Verbs and adverbs pose
particular problems. For example, the sentence, "Finn echoed the concern of
Bormuth," when transformed to "What did Finn do to the concern of Bormuth?"
is clumsy and less important than "Who echoed the concern of Bormuth?"
After considerable effort to produce questions from verbs and adverbs, the
authors of this report concluded that the most promising question words
are adjectives, nouns, or phrases including an adjective or a noun.

Adjectives and nouns can be further classified by type. For example,
either may be part of a noun phrase, and nouns may be possessive. If an 
algorithm is to be fully defined, then, the classifications of the question 
words within parts of speech must be specified to eliminate ambiguity for 
the item writer who selects the words. 

4. Sentence Analysis. Once a question word has been selected, the
sentence in which it occurs is analyzed or diagrammed to identify its
important parts (e.g., subject, verb, and object). This procedure is advantageous
for two reasons. First, parts of speech that are least promising for
question words (i.e., explicatives, functional verbs, articles, and prepositions)
either appear as parts of phrases or not at all. Second, the number of
questions possible for a given sentence becomes a function of the number
of case phrases and nonzero verbs in the sentence rather than the number of
words.

5. Sentence Transformation. The next step is to transform the sentence
into a question by replacing the question word, usually an adjective, a
noun, or a phrase including an adjective or a noun, with a wh-word. Where
several wordings are possible, an attempt is made to stay as close as
possible to the wording of the original sentence. Sentences may also be
transformed by replacing pronouns with their appropriate nouns and references
to previous sentences with clauses or phrases from those sentences.
However, this method does not produce 100 percent agreement among item writers.



6. Algorithmic Generation of Foils (response alternatives). The first
step in an algorithmic generation of foils is to classify the correct
alternative so that possible foils can be obtained from a list of words similarly
classified. The most logical source of foils would seem to be the prose
passage itself but, in some cases, published lists of words (e.g., Carroll
et al., 1971) may be useful.
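
Steps 1 through 3 above can be sketched in Python as follows. This is a minimal illustration, not Finn's program: the function names, the tokenizing pattern, and the SFI values in the usage example are hypothetical, and a real application would take SFI values from a published word-frequency list such as Carroll et al. (1971). The `max_sfi` cutoff is a parameter; the default of 60 mirrors the cutoff this report applies in its own item development.

```python
import re
from collections import Counter


def text_frequencies(passage):
    """Count how often each word (lowercased) appears in the passage."""
    # Hypothetical tokenizer: alphabetic words, allowing internal hyphens.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", passage.lower())
    return Counter(words)


def candidate_question_words(passage, sfi_table, max_sfi=60.0):
    """Return words that occur exactly once in the passage and have a
    low SFI, i.e., rare words that are not repeated in the text.

    Words absent from `sfi_table` are skipped, because their rarity in
    American English cannot be judged without an SFI value.
    """
    freqs = text_frequencies(passage)
    return sorted(
        word
        for word, count in freqs.items()
        if count == 1 and word in sfi_table and sfi_table[word] <= max_sfi
    )


if __name__ == "__main__":
    passage = (
        "The most primitive insects, such as the silverfish, "
        "do not go through metamorphosis. Insects grow."
    )
    # Illustrative SFI values only (except 88.6 for "the", cited above).
    sfi = {"the": 88.6, "most": 75.0, "insects": 55.0,
           "silverfish": 35.0, "metamorphosis": 45.0}
    print(candidate_question_words(passage, sfi))
```

In this example "insects" has a low SFI but is excluded because it appears twice, which reflects the point that repetition within a passage lowers a word's information value.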

Objective 

The objective of the present effort was to refine procedures for choosing
question words for use in wh-transformations of instructional sentences and
for algorithmically generating multiple-choice foils. Multiple-choice
testing is the most common testing method used in education and training.

APPROACH



Item Development 

A prose passage on insect development, which was written for
approximately the high school level, was selected for use in this study. This
passage is provided in the appendix. Items (stem and foils) to test
learning from this passage were then developed using the following procedure:

1. All of the words in the passage were keypunched into a computer
program to determine their standard frequency index (SFI) and text frequency.
Nouns and adjectives having an SFI of 60 or less were identified, since they
appeared to be the best candidates for question words. These nouns and
adjectives were then further classified to identify those that (1) appeared
only once in the text or (2) had a high text frequency. For the remainder
of this report, these classifications are referred to as rare singletons
and keywords, respectively.

2. Twenty sentences were selected for transformation into items. Five
of these sentences included rare singleton nouns; five, keyword nouns; five,
rare singleton adjectives; and five, keyword adjectives. These nouns and
adjectives are listed in Table 1.



Table 1

Question Words Selected

             Nouns                              Adjectives
Rare Singleton   Keyword              Rare Singleton   Keyword
--------------   -----------------    --------------   ---------------
Instars          Insect (8)           Plant-feeding    Immature (3)
Cicadas          Insects (20)         Pupal            Incomplete (2)
Silverfish       Metamorphosis (9)    Spine-like       Nymphal (2)
Wasps            Egg (8)              Self-made        Aquatic (2)
Appetites        Adult (8)            Worm-like        Distinctive (2)

Note. The number appearing in parentheses after keywords represents text
frequency.



3. The selected sentences were transformed (using the wh- method) into
multiple-choice items by four item writers (Author Finn and three graduate
students from the State University of New York at Buffalo). After working
as a team to ensure that items produced were similar, the writers produced
items independently. For each of the 20 sentences selected, each writer
produced two items: The stems for the two items were identical but the
foils or alternatives for one item were generated informally by the writer
and those for the second item, by an algorithmic method. For example, the
rare singleton "silverfish" appeared in the following sentence: "The most
primitive insects, such as the silverfish, do not go through metamorphosis."
For this sentence, one writer produced the following stem: "The most
primitive insects, such as what, do not go through metamorphosis?" The first item
formed using this stem included foils produced informally by the writer,
in this case:

1. Butterflies          3. Canines

2. Silverfish           4. Cicadas

The second item included foils generated algorithmically, in this case:

1. Silverfish           3. Individuals

2. Females              4. Wasps

This process resulted in 160 multiple-choice items: 20 selected sentences
transformed by four item writers using two foil methods. For a given sentence,
the stems and foils produced by the writers were comparable but not identical.
However, the foils produced algorithmically were the same across items/writers.
Examples are provided in the appendix.

Algorithmic Foil Generation 

In generating foils algorithmically, the writers experimented with a
method based on the Word Frequency Index (Carroll et al., 1971), which
provides SFIs derived from a corpus of more than five million words. Question
words (e.g., silverfish) were located in the index, and other words in the
index having similar SFIs were identified for possible use as foils. However,
the index proved to be an
unacceptable source for this particular application; thus, an algorithmic
method of foil construction was developed that extracted foils from the
prose passage itself, and variations of that algorithm were developed for
nouns and for adjectives.

The rare singleton and keyword nouns selected as question words were 
classified semantically using the method developed by Fredericksen (1975), 
which is shown in Figure 1. For example, using this method, the singleton 
noun "silverfish" would be classified as a concrete, processive, animate 
noun (41). Other rare singleton and keyword nouns in the passage that also 
met this classification were then selected at random to create foils. Those 
selected as foils for "silverfish" using this method were "females," "indivi- 
duals," and "wasps," as indicated above. 
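
The noun procedure just described can be sketched as below. This is an assumption-laden illustration rather than the authors' program: the function name `noun_foils`, the class labels, and the classifications assigned to "appetites" and "egg" are hypothetical. Only the grouping of "silverfish" with "females," "individuals," and "wasps" comes from the report.

```python
import random


def noun_foils(answer, noun_classes, n_foils=3, seed=None):
    """Randomly choose foils from passage nouns that share the semantic
    class of the correct answer.

    noun_classes maps each rare singleton or keyword noun in the passage
    to a semantic class label (hypothetical labels in this sketch).
    """
    target_class = noun_classes[answer]
    # Sort the pool so results are reproducible for a given seed.
    pool = sorted(
        noun for noun, cls in noun_classes.items()
        if cls == target_class and noun != answer
    )
    if len(pool) < n_foils:
        raise ValueError("not enough nouns in the same class to build foils")
    return random.Random(seed).sample(pool, n_foils)


if __name__ == "__main__":
    classes = {
        "silverfish": "concrete/processive/animate",
        "females": "concrete/processive/animate",
        "individuals": "concrete/processive/animate",
        "wasps": "concrete/processive/animate",
        "appetites": "abstract/processive",   # hypothetical label
        "egg": "concrete/static/nonsymbolic",  # hypothetical label
    }
    print(noun_foils("silverfish", classes, seed=0))
```

Drawing from the passage itself keeps foils topically related, but random selection within a class can still yield implausible options, so a human review step may be worthwhile.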

All rare singleton and keyword adjectives in the prose passage (not just
those selected as question words) were classified using semantic differential
techniques (Nunnally, 1967, pp. 536-538). In research using these techniques,
adjectives are typically classified based on their (1) evaluation (e.g.,
good or bad), (2) potency (e.g., strong or weak), (3) activity (e.g., fast
or slow), and (4) familiarity (e.g., simple or complex). In addition to
these four categories, rare singleton and keyword adjectives in the prose
passage were classified according to whether or not they could be considered
"technical" words. This latter category is especially useful in
technically oriented material, particularly for grouping adjectives that relate
to a certain noun.




[Figure 1: tree diagram. Recoverable category labels and examples:
Concrete; Abstract; Processive (+ change); Static (- change);
Animate (animal, man, insect, John); Inanimate;
Symbolic (book, letter, picture); Nonsymbolic (rock, house, shovel);
Symbolic (movie, game, song, speech);
Nonsymbolic (wind, heat, noise, pressure);
Processive-Abstract (love, hope); Static-Abstract (length, pounds, size).]

Figure 1. Fredericksen's semantic classification of nouns.



After these adjectives were classified according to the five categories
noted above, they were subjected to an analysis of familiarity, using the
Dale-Chall (1948) list of 3000 familiar words. If they were included in
that list, they were not considered for use as foils because they were too
familiar and, thus, too easy. Approximately 50 adjectives passed this screen
and qualified for use as foils. Foils for adjective question words were then
developed by randomly selecting those having the same classification (i.e.,
as to evaluation, potency, etc.). For example, those selected for the rare
singleton "pupal" were "nymphal," "parasitic," and "insect" (see appendix).
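
The adjective screen can be illustrated with a short sketch. Everything specific here is assumed for the example: the function name `adjective_foil_pool`, the rating tuples, and the small `familiar` set standing in for the Dale-Chall list are hypothetical. Only the grouping of "pupal" with "nymphal," "parasitic," and "insect" comes from the report.

```python
def adjective_foil_pool(answer, ratings, familiar_words):
    """Return adjectives eligible as foils for `answer`.

    ratings maps each adjective to a tuple of classification codes
    (evaluation, potency, activity, familiarity, technical); the codes
    used below are hypothetical. Adjectives on the familiar-word list
    are screened out as too easy.
    """
    target = ratings[answer]
    return sorted(
        adj for adj, codes in ratings.items()
        if adj != answer and codes == target and adj not in familiar_words
    )


if __name__ == "__main__":
    ratings = {
        "pupal": ("neutral", "weak", "slow", "complex", True),
        "nymphal": ("neutral", "weak", "slow", "complex", True),
        "parasitic": ("neutral", "weak", "slow", "complex", True),
        "insect": ("neutral", "weak", "slow", "complex", True),
        "short": ("neutral", "weak", "slow", "simple", False),
    }
    familiar = {"short"}  # stand-in for the familiar-word list
    print(adjective_foil_pool("pupal", ratings, familiar))
```

Foils would then be drawn at random from the returned pool, as in the procedure described above.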

Test Construction and Administration 

From the 160 items, eight 20-item test forms were developed. Each test
included five items generated from rare singleton nouns; five, from keyword
nouns; five, from rare singleton adjectives; and five, from keyword
adjectives. In addition, test forms were organized so that each included five
items from each of the four item writers, 10 items with foils generated
informally by the item writers, and 10 items with foils generated
algorithmically. The internal consistency reliability estimates (Kuder-Richardson
Reliability Formula Number 20) averaged ,$3 for these test forms.
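
For readers unfamiliar with the Kuder-Richardson Formula 20, the sketch below computes it from a matrix of right/wrong item scores using the standard form KR-20 = (k / (k - 1)) * (1 - sum(p_i * q_i) / variance of total scores). The function name `kr20` and the toy data are hypothetical, and the use of the population variance (dividing by the number of examinees) is an assumption; the report does not say which variance convention was used.

```python
from statistics import pvariance


def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item scores.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Uses the population variance of total scores (an assumption).
    """
    n_people = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)
    if total_variance == 0:
        raise ValueError("total scores have zero variance")
    # Sum of p*q across items, where p is the proportion answering correctly.
    sum_pq = 0.0
    for i in range(n_items):
        p = sum(row[i] for row in responses) / n_people
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / total_variance)


if __name__ == "__main__":
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]  # illustrative data
    print(round(kr20(toy), 3))
```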

The eight forms were administered to 24 students from the Oregon College
of Education before (pretest) and after (posttest) they had studied the
prose passage on insect development. For both pretest and posttest, three
subjects were randomly assigned to each of the eight test forms; however,
care was taken to ensure that the pretest and posttest forms administered
to each student were different.

Analyses 

Average pretest and posttest item difficulties, as determined by the
percentages of students who answered the item correctly, were computed for items
in the following categories: (1) those produced by each of the four writers,
(2) those derived from each of the four types of question words, and (3)
those with foils generated either informally by the writers or
algorithmically. It was hypothesized that items generated from rare singleton nouns
and adjectives would provide the best instructional sensitivity, as
determined by the difference between their pretest and posttest item difficulties.
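
The two quantities defined above are simple to compute. The sketch below uses hypothetical function names and toy response data; it follows the report's definitions, in which item difficulty is the percentage of students answering correctly (so higher values mean easier items) and instructional sensitivity is posttest difficulty minus pretest difficulty.

```python
def item_difficulty(scores):
    """Percentage of students who answered the item correctly.

    scores: list of 0/1 values, one per student.
    """
    return 100.0 * sum(scores) / len(scores)


def instructional_sensitivity(pretest_scores, posttest_scores):
    """Posttest difficulty minus pretest difficulty, in percentage points."""
    return item_difficulty(posttest_scores) - item_difficulty(pretest_scores)


if __name__ == "__main__":
    pre = [0, 1, 0, 0]   # illustrative data: 25% correct before instruction
    post = [1, 1, 1, 0]  # illustrative data: 75% correct after instruction
    print(instructional_sensitivity(pre, post))
```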

Due to possible fluctuations in item difficulty because of the small 
sample size, a nonparametric analysis of variance (ANOVA) (Wilson, 1956)
was used to examine differences in item difficulties between (1) the four 
item writers, (2) the four question word types, (3) the two foil types, and 
(4) the two test occasions. 

With 160 items administered on two occasions, the analysis had 320 data
points and five replications per cell. The nonparametric ANOVA is based
on identifying the number of item difficulties that fall above or below
a grand median; thus, contingency tables were created to display the number
of observations falling above or below the median in each cell of the
factorial design, as suggested by Wilson (1956). The chi-square statistic
for the contingency table, created by using all four factors in the design,
was then decomposed into sources of variation in the same manner that a
total sum-of-squares is decomposed in a parametric ANOVA. The decomposition
of chi-square was shown originally by Rao (1952, pp. 192-205).
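
The median-split idea behind this analysis can be sketched for a single factor. The code below is only an illustration under stated assumptions: it builds a two-column table (above the grand median versus at or below it) for the levels of one factor and computes an ordinary Pearson chi-square. It does not reproduce the full four-factor decomposition described by Wilson (1956), and the function name, the handling of ties, and the toy data are hypothetical.

```python
from statistics import median


def median_split_chi_square(groups):
    """Pearson chi-square for counts above vs. at-or-below the grand median.

    groups: dict mapping a factor level to its list of item difficulties.
    Returns (chi_square, degrees_of_freedom). Values equal to the grand
    median are counted as "not above" (an assumption about ties).
    """
    all_values = [v for values in groups.values() for v in values]
    grand_median = median(all_values)
    n = len(all_values)
    # One (above, not_above) row per factor level.
    table = [
        (sum(v > grand_median for v in values),
         sum(v <= grand_median for v in values))
        for values in groups.values()
    ]
    column_totals = (sum(r[0] for r in table), sum(r[1] for r in table))
    chi_square = 0.0
    for row in table:
        row_total = sum(row)
        for observed, column_total in zip(row, column_totals):
            expected = row_total * column_total / n
            if expected > 0:
                chi_square += (observed - expected) ** 2 / expected
    return chi_square, len(groups) - 1


if __name__ == "__main__":
    toy = {"pretest": [20, 30, 40, 50], "posttest": [60, 70, 80, 90]}
    print(median_split_chi_square(toy))
```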

The ANOVA is also useful for determining items' instructional sensitivity:
A significant main effect for the pretest-posttest factor would indicate
that pretest difficulties were significantly different from posttest
difficulties for all items. A significant interaction effect involving the
pretest-posttest factor would indicate that certain types of items differed
in the pattern of their pretest and posttest difficulties.



RESULTS 

Average Item Difficulty and Instructional Sensitivity 



Table 2, which provides average item difficulty and instructional
sensitivity, indicates that items derived from rare singleton nouns showed a
good pattern of pretest and posttest difficulty (56.2 to 88.3%), and had the
highest mean instructional sensitivity (32.1%). Items derived from rare
singleton adjectives showed a pattern of average item difficulties similar
to that of rare singleton nouns (54.4 to 79.3%); however, these items were
somewhat more difficult than the former on the posttest. Also, the mean
instructional sensitivity for rare singleton adjectives was not as high as that
for keyword adjectives (24.9 vs. 29.6%). Thus, the hypothesis that rare
singleton nouns and adjectives would provide the best instructional
sensitivity was only partly supported.

Table 2 also shows that items derived from keyword nouns were
significantly easier on the pretest than were items derived from the other question
words. An examination of the text sentences in which these words appeared
showed that they were typically introductory and, thus, very general. For
example, the keyword noun "insects" appears in the very first sentence:
"The life of most insects is short but active." Items derived from such
general statements usually concern common knowledge, so students can answer
them correctly without having to read the prose passage. Further, items based
on keyword nouns were easier on the posttest than the others, although not
to a significant degree. This finding supports the hypothesis (Finn, Note
2) that the information content of words (even if they are rare in American
English) is reduced by their high text frequency. As shown in Table 1,
keyword nouns used in this study had a text frequency ranging from 8 to
20.

Keyword adjectives produced the most difficult items on the posttest,
a finding which is not consistent with the above hypothesis. The reason
for this apparent inconsistency is shown in Table 1: With text frequencies
of two or three, the keyword adjectives were very close to being rare
singletons.

The two types of foils proved to be almost equally effective for
learning, as evidenced by the similarity in posttest item difficulty.
However, items with foils that were informally generated by the item writers were
considerably harder on the pretest (i.e., students were not able to guess the
correct answer as often when such foils were used), and had a much higher
instructional sensitivity than items with algorithmically generated foils
(30.5 vs. 19.4). This is understandable, since any automated method inevitably will
produce some implausible foils. A skilled item writer, on the other hand,
can choose foils that fit the meaning and semantic qualities of the item
stem and the correct answer.
Table 2

Average Item Difficulty and Instructional Sensitivity

                                       Pretest        Posttest     Instructional
                                                                   Sensitivity
Item Category                        Mean   S.D.    Mean   S.D.    (Posttest Minus Pretest)
-----------------------------------  -----  -----   -----  -----   ------------------------
Item Writers
  #1 (N = 5)                         62.9   37.8    81.9   28.5    19.0
  #2 (N = 5)                         65.0   36.4    85.8   28.3    20.8
  #3 (N = 5)                         49.5   36.7    82.9   30.9    33.4
  #4 (N = 5)                         57.7   33.8    83.7   28.1    26.0
  All Writers (N = 20)               58.8   36.2    83.6   28.9    24.8

Type of Word
  Rare Singleton Noun (N = 5)        56.2   34.4    88.3   21.5    32.1
  Keyword Noun (N = 5)               77.1   31.2    89.6   25.0    12.5
  Rare Singleton Adjective (N = 5)   54.4   41.3    79.3   32.2    24.9
  Keyword Adjective (N = 5)          47.5   31.9    77.1   33.6    29.6
  All Types of Words (N = 20)        58.8   34.7    83.6   28.1    24.8

Type of Foil
  Writer Generated (N = 10)          54.7   37.4    85.2   29.0    30.5
  Algorithmically Generated (N = 10) 62.9   35.0    82.3   28.5    19.4
  Both Types of Foil (N = 20)        58.8   36.2    83.8   28.8    24.9

Analysis of Average Item Difficulty

The results of the nonparametric analysis of variance on average item
difficulty are presented in Table 3. The main effect for test occasions
(D) was strongest, which indicates that, across all types of items, a
higher percentage of students answered items correctly on the posttest than
the pretest (83.5 vs. 58.8% in Table 2). In other words, most items showed
instructional sensitivity: the students did learn from reading the passage.
Further, the overall pretest item difficulty of 58.8 percent indicates that
over half the students were able to guess the correct answer to most questions
without reading the passage. Thus, the items developed could not be rated
"excellent"; with four-alternative, multiple-choice items, such as those
used in this study, "excellent" items should show pretest difficulties nearer
to the level of random guessing; that is, 25 percent.



Table 3

Results of a Nonparametric Analysis of Variance on
Item Difficulties for Items in Each Category

Source of Variation          Chi-Square    df
-------------------------    ----------    ----
A (Writers)                      2.51       3
B (Word types)                  16.32       3*
C (Foil types)                   0.31       1
D (Pretest vs. Posttest)        45.53       1*
AB                               8.24       9
AC                               1.28       3
AD                               2.86       3
BC                               2.07       3
BD                               2.25       3
CD                               3.71       1
ABC                              7.97       9
ABD                             18.29       9**
ACD                              8.40       3**
BCD                              4.01       3
ABCD                            12.45       9
Total                          134.20      63

*p < .001
**p < .05
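
The degrees of freedom in Table 3 follow from the design: four writers (A), four word types (B), two foil types (C), and two test occasions (D). As a quick check, the sketch below computes the df for every main effect and interaction as the product of (levels - 1) over the factors involved. The function name `effect_df` is hypothetical; the factor levels are taken from the design described in this report.

```python
from itertools import combinations
from math import prod


def effect_df(levels):
    """Degrees of freedom for every main effect and interaction in a
    full factorial design, keyed by concatenated factor letters."""
    names = sorted(levels)
    result = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            result["".join(combo)] = prod(levels[f] - 1 for f in combo)
    return result


if __name__ == "__main__":
    design = {"A": 4, "B": 4, "C": 2, "D": 2}
    df = effect_df(design)
    for source, value in df.items():
        print(source, value)
    # The effects together account for (number of cells - 1) df.
    print("Total", sum(df.values()))
```

The total equals the number of cells in the design (4 x 4 x 2 x 2 = 64) minus one, which matches the 63 df reported in the table.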



There was also a main effect for word type (B). This effect was caused
by the fact that items derived from keyword nouns were significantly easier
on the pretest than other items. The reason for this was discussed previously.

As shown, there were no significant main effects for writers (A) or foil
types (C) and no significant two-way interactions. However, there were two
significant three-way interactions: (1) ABD (writers by word type by
pretest-posttest) and (2) ACD (writers by foil types by pretest-posttest).
Inspection of the item difficulties in each cell for the ABD interaction indicated
the following variations between writers:

1. Writers #2 and #4 wrote keyword noun items that were much easier
for students to guess correctly on the pretest than those written by
Writers #1 and #3.

2. Writer #2 wrote rare singleton noun items that were much easier
for students to answer correctly on the posttest than those written by the
other writers.

3. Writer #4 wrote "excellent" rare singleton adjective items, as
indicated by the high instructional sensitivity they showed from pretest to
posttest.

Examination of the ACD interaction revealed that Writer #3 generated
excellent foils, as evidenced by the high instructional sensitivity items
with such foils showed from pretest to posttest. A comparison of foils
generated by Writer #3 with those generated by other writers showed that he
had selected foils that were more (1) logically related to the passage, (2)
difficult, and (3) semantically parallel to the correct answer.

Although the effects of the significant three-way interactions found
in this study were not as strong as the main effects for test occasion or
word type, they do suggest two important possibilities:

1. The skill of item writers will vary to the extent that a good item
writer can produce foils that are better than those produced algorithmically.

2. An algorithmic foil-generating method can smooth out differences
between item writers with different capabilities.

CONCLUSIONS



The concept of using a computer-based algorithm to analyze prose
instructional materials and to identify high information words (i.e., those that
are rare in American English) appears to be workable. High information
nouns or adjectives identified as rare singletons (those occurring only
once in a passage) are apparently good candidates for question words. High
information adjectives identified as keywords (those occurring more than
once in a passage) also appear to be good candidates for question words,
providing they occur only two or three times. In contrast, keyword nouns
apparently are not good candidates, particularly when they occur in general
introductory sentences.

The methods used in this study to generate foils algorithmically for
multiple-choice versions of sentence-derived items appear to be feasible.
Although foils generated in this manner may be somewhat easier than those
generated by item writers, they still appear to produce significant
instructional sensitivity (a shift in difficulty from pretest to posttest when
instruction is provided between testing sessions).

RECOMMENDATIONS

1. Rare singleton nouns and adjectives and keyword adjectives that
occur infrequently in instructional material should be used to select
sentences from prose passages for transformation into questions that measure
reading comprehension. Keyword nouns should not be used, particularly when
they occur in general introductory sentences.

2. Methods of algorithmically generating foils for multiple-choice
versions of sentence-derived questions should be further refined and applied
in a variety of subject matter areas.

APPENDIX



THE PROSE PASSAGE USED IN THE EXPERIMENT
AND EXAMPLES OF ITEMS PRODUCED FROM TEXT



PROSE PASSAGE USED IN THE EXPERIMENT



4. INSECT DEVELOPMENT 



The !ife of most insects is short bu( active. Very 
few insecU bave a lif(>'Span of more th^n a year. 
By a lifo'^pan wc mean the time from whea the 
ef^ is VMd tn when the fully developed 4idutt dies. 
Let's look ill wh^i happens during this [leriod. 

All insecU develop from eggs. En most cases 
these eggs hatch outside tlie body of the female. 
In th^' few cases in which Ihe eggs liatch inside 
the female the young are bom ^alive.** These Ui- 
secis, such as the aphids, are said to be viviparous. 
(vy-v<p'-ah*rus). 

Inst^is thit hatch from eggs after they have 
been laid are said to be oviparous (oh-vlp'^-ah^nis), 
Mo5t injects are oviparous. In mo$t cases each 
i't£g produces a single immature insect However, 
in oertain species of parasitic w^isps (encyrtids), 
Hie ei;g may produce two or more young. 

Most insi^x OKgs are very distinctive. The siie, 
shape, or color of the egg is different, in most 
cases, for each species of insect. This enables a 
person who has made a study of these egg$ to 
identi£> the (n5ec1 that laid them abnost easily 
a$ if he had seen the adult. 

Most insect eggs are laid in a place that will
provide either protection or food for the young.
Protection is especially important to those insects
that overwinter in the egg stage. Overwintering
means that the adult insect lays its eggs in the
late summer or early fall. The eggs then are
dormant until the next spring when they hatch. Most
of the adults of these species are killed by the
first frost. However, the hatching of these eggs in
the spring produces new individuals to carry on
the species.

Most plant-feeding insects instinctively lay their
eggs on plants that the young feed on. This
increases the immature insects' chances of survival.
If this field of investigation interests you, the study
and photography of insect eggs might make a
good project.

After reaching the proper stage of development,
the egg will hatch. The young insect can use a
number of ways to get out of the egg. Some insects
chew their way out. Others have special spinelike
structures, called egg-bursters, which cut through
the shell. There are some eggs which have special
weak spots in them. The young insect escapes
from these either by wriggling or by taking in air
and bursting the shell with internal pressure.

After the Egg 

After hatching, all insects, except the most
primitive, go through a series of steps in
development. These steps are called metamorphosis. The
word metamorphosis comes from two Greek
words: meta, meaning to change, and morpho,
meaning form. Therefore, metamorphosis means
a change in form. This change in form occurs in
two different ways. These two ways are called
complete and incomplete metamorphosis. The
most primitive insects, such as the silverfish, do
not go through metamorphosis. When they hatch
they look like their parents in every way except
that they are smaller. Their development consists
of growing larger and becoming able to
reproduce.

Incomplete Metamorphosis 

Insects which show this type of metamorphosis
have young which look very much like the adults
of the species. These immature insects are called
nymphs. With the exception of some aquatic
species, the principal differences between the nymphs
and adults are in size and the presence of wings
(see illustration at the right).

Now think back to the description of the
phylum to which insects belong, Arthropoda.
Remember, one of the characteristics of these animals is
a hard outer covering called an exoskeleton. The
exoskeleton is made of a nonliving substance
called chitin (ki'-tin). Chitin is hard and stiff and
has very little "stretch." Inside the exoskeleton
there is very little room for growth.

In order to grow, the nymph must escape this
self-made prison. It does this by secreting a new
exoskeleton under the old one. When this new
skin is complete the old skeleton splits down the
back and the insect walks away and leaves it
behind. You have probably seen some of these
discarded skins, called casts, on tree trunks.

Note. Special permission granted by What Insect Is That? published by
Xerox Education Publications, (c) 1965 Xerox Corp.

For a time after the insect discards its old skin,
the new exoskeleton is soft. This allows the
exoskeleton to expand and make room for further
growth.

Each of the periods between molts is called an
instar. Some nymphs go through as many as eight
or more instars before emerging as adults.

Aquatic species that undergo incomplete
metamorphosis must go through one more step in
development. As nymphs they breathe by means of
gills. These gills must be replaced by air-breathing
organs in the adult stage. This is done in the
last nymphal instar. When it is time for the adult
to emerge, the nymph rises to the surface and
molts. The fully developed adult steps out of the
final nymphal skin with fully developed organs
for breathing air.

Complete Metamorphosis 

This is the type of metamorphosis that most
people are familiar with. Butterflies and moths
have complete metamorphosis. There are four
distinct stages: egg, larva, pupa, and adult. Since
the adult's main activity is producing eggs, and
I'm sure you know what these are, we will spend
our time studying the larva and pupa.

The larvae's main job in life is to eat and grow.
They have huge appetites. Larvae are very
different from the adults. They do not have compound
eyes, wings, and usually have chewing mouth
parts even in those orders where the adults have
sucking mouth parts.

A larva may continue to eat and grow all
summer. As cold weather approaches, it may build a
cocoon and pass into the pupal stage.

Most of these insects pass the winter inside the
cocoon. Because no activity is visible at this time,
the pupa has been falsely called a "resting stage."
Actually a great deal of activity is going on. The
wormlike larva is changing into a fully developed
adult. When the weather is warm again, this adult
emerges from the cocoon, mates, lays eggs, and
starts the whole process over again.
Let's Get Together

Most insects reproduce sexually. This means
that, to have eggs that will hatch, a male and
female of the species must mate. The question is,
How do they find each other?

It has been known for years that some of the
sounds made by crickets and cicadas were a type
of mating call. It is easy to see how these insects
get together. But what about the insects that do
not make noise; butterflies, for instance?

It has been discovered that the females of these
species give off a distinctive odor. This odor is
detectable by male insects over great distances.
The male follows this scent trail back to the
female.

This brings to mind an interesting experiment
you might try. A friend of mine once caught a
recently emerged female Promethea moth. He put
the female in a screen cage and set it outside his
window. In less than two hours there were more
than twenty males hanging on the outside of the
cage. Why don't you try this with other kinds of
insects? It would make a great science project.

Science has used the discovery of these odors to
help eliminate undesirable insects. It was found
that female cockroaches give off an attractive (to
male cockroaches) odor. Scientists have been able
to reproduce this scent and have used it to attract
males to traps.

Exercises 

How Well Did You Read?

1. Name and describe the three types of development
insects can go through.

2. What advantage is there in insect eggs being laid on
certain plants?

3. What is metamorphosis? What are the differences
between complete and incomplete metamorphosis?

4. What processes take place during the growth of
insects?

5. Can you think of any advantages to some insects in
being born "alive"?

Read A Little More

1. Lemmon, R. S. All About Moths and Butterflies.
New York: Random House, 1956.
EXAMPLES OF ITEMS PRODUCED FROM TEXT

1. Keyword Noun: Metamorphosis

a. Text Sentence(s): After hatching, all insects, except the most primitive,
go through a series of steps in development. These
steps are called metamorphosis.

b. Items (stem and Foils) Produced by Item Writers; 

(1) What are the series of steps in insect development called? 

(a) Maturation (c) Symbiosis 

(b) Metamorphosis (d) Meitosis 

(2) What are the steps insects go through in development called? 

(a) Metamorphosis (c) Larva

(b) Arthropoda (d) Pupa 

(3) What are a series of steps in development called? 

(a) Reproduction (c) Metamorphosis

(b) Larvae (d) Changes 

(4) What are the series of steps in insect development called?

(a) Encyrtid (c) Arthropoda

(b) Instar (d) Metamorphosis 

c. Foils Produced Algorithmically:

Growths 

Metamorphosis 

Types 

Activities 

2. Rare Singleton Noun: Silverfish

a. Text Sentence: The most primitive insects, such as the silverfish, do
not go through metamorphosis.

b. Items (stem and Foils) Produced by Item Writers:

(1) What does not go through metamorphosis? The 

(a) Moth (c) Nymphs 

(b) Silverfish (d) Butterfly 

(2) What do not go through metamorphosis? The most primitive insects,
such as

(a) Silverfish (c) Spiders 

(b) Termites (d) Moths 

(3) What insects do not go through metamorphosis? The primitive, such as 

(a) Eggs (c) Chitin 

(b) Silverfish (d) Butterflies 



(4) The most primitive insects, such as what, do not go through metamorphosis?

(a) Butterflies (c) Canines 

(b) Silverfish (d) Cicadas 

c. Foils Produced Algorithmically:

Silverfish 
Females 
Individuals 
Wasps 
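
The algorithm used to generate foils is not reproduced in this appendix. As a minimal illustrative sketch only, the Python below shows how a stem such as item (4) above could be formed by substituting "what" for the keyword in the text sentence, and how the keyed answer and the algorithmically produced foils listed above might then be assembled into a single multiple-choice item. The function names make_stem and make_item, the seeded shuffle, and the option labels are assumptions made for illustration; they are not the procedure used in the experiment.

```python
import random


def make_stem(sentence, keyword, wh_word="what"):
    """Turn a text sentence into a question stem by swapping the keyword for a wh-word."""
    if keyword not in sentence:
        raise ValueError(f"keyword {keyword!r} not found in sentence")
    # Replace only the first occurrence, then trade the final period for a question mark.
    stem = sentence.replace(keyword, wh_word, 1).rstrip(". ")
    return stem + "?"


def make_item(sentence, keyword, answer, foils, seed=0):
    """Assemble a multiple-choice item from a keyed answer and a list of foils."""
    options = [answer] + list(foils)
    random.Random(seed).shuffle(options)  # seeded so the option order is reproducible
    labels = "abcd"[: len(options)]
    return {
        "stem": make_stem(sentence, keyword),
        "options": dict(zip(labels, options)),
        "key": labels[options.index(answer)],
    }


# Hypothetical usage with the silverfish sentence and the foils listed above.
sentence = ("The most primitive insects, such as the silverfish, "
            "do not go through metamorphosis.")
item = make_item(sentence, "the silverfish", "Silverfish",
                 ["Females", "Individuals", "Wasps"])
print(item["stem"])
for label, option in item["options"].items():
    print(f"({label}) {option}")
```

Because the stem is produced by a fixed substitution on the text sentence, any two item writers applying the same rule to the same keyword would obtain the same stem; only the source of the foils differs between the writer-produced and algorithmic versions.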

3. Keyword Adjective: Immature

a. Text Sentence: In most cases, each egg produces a single immature insect.

b. Items (stem and Foils) Produced by Item Writers:

(1) What does each egg produce in most cases? A single

(a) Immature insect (c) Adolescent insect 

(b) Adult insect (d) Mature insect 

(2) What does each egg produce in most cases? A single 

(a) Oviparous insect (c) Mature insect

(b) Nymphal insect (d) Immature insect

(3) In most cases, what does each egg produce? A single 

(a) Dormant insect (c) Adult insect 

(b) Adult insect (d) Immature insect

(4) What does each egg produce? A single 

(a) Immature insect (c) Round insect 

(b) Mature insect (d) Adult insect

c. Foils Produced Algorithmically:

Complete insect 
Distinct insect 
Immature insect 
Incomplete insect 

4. Rare Singleton Adjective: Pupal

a. Text Sentence(s): A larva may continue to eat and grow all summer. As
cold weather approaches, it may build a cocoon and
pass into the pupal stage.

b. Items (stem and Foils) Produced by Item Writers: 

(1) What may a larva do as the cold weather approaches? Build a cocoon 
and pass into the 

(a) Nymphal stage (c) Pupal stage

(b) Parasitic stage (d) Molt stage 



(2) As cold weather approaches, a larva may build a cocoon and pass
into what?

(a) Infant stage (c) Butterfly stage

(b) Adult stage (d) Pupal stage

(3) Into what stage may the larva pass as cold weather approaches and
it builds a cocoon? The

(a) Larval stage (c) Skeletal stage

(b) Pupal stage (d) Nymphal stage

(4) As cold weather approaches, what may a larva do? Build a cocoon
and pass into the

(a) Pupal stage (c) Dormant stage

(b) Hibernation stage (d) Resting stage

c. Foils Produced Algorithmically:

Pupal stage
Nymphal stage
Parasitic stage
Insect stage